allowing thousand separators + fix the consumed indicator#111
Merged
CarlVerret merged 2 commits intomasterfrom Apr 27, 2026
Merged
allowing thousand separators + fix the consumed indicator#111CarlVerret merged 2 commits intomasterfrom
CarlVerret merged 2 commits intomasterfrom
Conversation
Collaborator
Author
|
@nietras Would that work? |
Contributor
@lemire I would think so, will try to test in Sep soon hopefully. Thanks :) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This addresses two issues. The first is a small bug in the way we report characters_consumed from TryParseNumber: in the UTF-16 path of FastFloatParser and the UTF-8 path of FastDoubleParser, we were advancing the start pointer past leading whitespace before parsing the number, then setting characters_consumed = pns.characters_consumed — which is measured from the new start, so the leading whitespace was silently dropped from the count. The two sibling paths already tracked a leading_spaces counter and did the addition; this PR brings the other two in line. While I was there I also turned the ThrowArgumentException() on empty input in FastFloatParser.TryParseNumber(byte*, ...) into a return false, since Try* should not throw.
The second change is motivated by issue #110, where FastFloatParser.ParseFloat("1,234,567.89") returns 1 while float.Parse("1,234,567.89", CultureInfo.InvariantCulture) returns 1234567.9. The reason is that the BCL’s Parse(string, IFormatProvider) overload defaults to NumberStyles.Float | NumberStyles.AllowThousands, and we did not honour AllowThousands at all. To make matching that behaviour possible without changing our defaults, this PR adds an optional thousands_separator parameter (default ',' for the char overloads and (byte)',' for the byte overloads) on every public Parse* / TryParse* entry point. When NumberStyles.AllowThousands is in styles and the thousands separator differs from the decimal separator, ParsedNumberString will accept a separator that lies strictly between two digits in the
integer part of the number; otherwise the previous behaviour is preserved. The European convention (thousands_separator: '.', decimal_separator: ',') works as you would expect. Default behaviour is unchanged: without AllowThousands, parsing "1,234" still stops at the comma and consumes one character.
The change to ParsedNumberString.ParseNumberString was deliberately small. I switched the integer-part loop from raw pointer arithmetic to a counter so that the reported digit count is correct in the presence of separators, and I extended the rare reparse path used when there are more than nineteen digits so that it skips separators while computing the truncated mantissa and exponent. The fractional, exponent, sign, and SIMD code paths are untouched.
Tests cover both fixes. There are new theories for characters_consumed correctness across the string, char*, and byte* overloads of both parsers, including a "walk through the buffer" test that would have caught the original bug. There are also tests for the new parameter: BCL-parity for typical inputs like 1,234,567.89, the European convention, no-op behaviour when AllowThousands is absent, the byte* path, and rejection of a leading separator. The 24-digit case (1,234,567,890,123,456,789,012,345) exercises the long-mantissa reparse path.
A short benchmark comparison (BenchmarkDotNet ShortRunJob, three iterations on canada/mesh/synthetic) shows no regression.
Note that this does not change the default styles for ParseFloat(string) to include AllowThousands — doing so would change the meaning of inputs that today parse to a partial result, which seemed too aggressive to do in the same change. The PR #110 test case will succeed once the caller opts in by passing NumberStyles.Float | NumberStyles.AllowThousands.